Accelerating finite-rate chemical kinetics with coprocessors: comparing vectorization methods on GPUs, MICs, and CPUs

Abstract

Efficient ordinary differential equation solvers for chemical kinetics must take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and nonstiff Runge-Kutta solver are implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms with OpenCL. The performances of these parallel implementations were measured with three chemical kinetic models across several multicore and many-core platforms. Two runtime benchmarks were conducted to clearly determine any performance advantage offered by either method: evaluating the right-hand-side source terms in parallel, and integrating a series of constant-pressure homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded code. The SIMT model on the host and Phi was 13-35% slower than the baseline while the SIMT model on the GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased 2.5-2.7x with the SIMD implementations on the host CPU and 4.7-4.9x with the Xeon Phi coprocessor compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.4-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi.
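
The divergence effect described in the abstract can be illustrated with a small sketch. It is not the authors' code or data: the step counts are synthetic, and the function names and lane widths are hypothetical choices made for illustration. The assumption is that lanes grouped into one vector advance in lockstep, so every lane in a group waits until the lane needing the most adaptive steps has finished.

```python
import random


def lockstep_efficiency(step_counts, width):
    """Fraction of issued lane-iterations that do useful work when
    `width` lanes advance in lockstep until the slowest lane finishes."""
    useful = 0
    issued = 0
    for start in range(0, len(step_counts), width):
        group = step_counts[start:start + width]
        useful += sum(group)                # steps each lane actually needed
        issued += max(group) * len(group)   # every lane waits for the slowest
    return useful / issued


def synthetic_step_counts(n, seed=0):
    """Hypothetical per-reactor adaptive step counts (not measured data)."""
    rng = random.Random(seed)
    return [rng.randint(20, 200) for _ in range(n)]


if __name__ == "__main__":
    counts = synthetic_step_counts(4096)
    # Illustrative lane widths only; they are not taken from the paper.
    for width in (4, 8, 32):
        print(width, round(lockstep_efficiency(counts, width), 3))
```

Under this simplified model the useful fraction cannot rise as the group gets wider, because a larger group's slowest lane is at least as slow as that of any subgroup. This matches the direction of the trend the abstract reports, although the real integrators and hardware are considerably more involved.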


Bibliographic record

Authors: Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.
Year: 2017
Original format: PDF

Similar literature

1. Accelerating finite-rate chemical kinetics with coprocessors: Comparing vectorization methods on GPUs, MICs, and CPUs [J]. Stone Christopher P., Alferman Andrew T., Niemeyer Kyle E. Computer physics communications. 2018.
2. Semiempirical Quantum Chemical Calculations Accelerated on a Hybrid Multicore CPU-GPU Computing Platform [J]. Xin Wu, Axel Koslowski, Walter Thiel. Journal of chemical theory and computation: JCTC. 2012, No. 7.
3. Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs [J]. Kyle E. Niemeyer, Chih-Jen Sung. Journal of Computational Physics. 2014.
4. GPU-accelerated Software Library for Unsteady Flamelet Modeling of Turbulent Combustion with Complex Chemical Kinetics [C]. Ramanan Sankaran. AIAA aerospace sciences meeting including the new horizons forum and aerospace exposition. 2013.
5. Algorithmic and software system support to accelerate data processing in CPU-GPU hybrid computing environments [D]. Wang, Kaibo. 2015.
6. Scaling methods for accelerating kinetic Monte Carlo simulations of chemical reaction networks [O]. Yen Ting Lin, Song Feng, William S. Hlavacek.
7. Accelerating finite-rate chemical kinetics with coprocessors: comparing vectorization methods on GPUs, MICs, and CPUs [O]. Stone, Christopher P., Alferman, Andrew T., Niemeyer, Kyle E. 2017.
8. Block-Iterative Methods for 3D Constant-Coefficient Stencils on GPUs and Multicore CPUs [R]. Rodriguez, M., Philip, B., Wang, Z. 2014.
